Source: This material adapted from the Python for Biologists website.
A condition is simply a bit of code that can produce a true or false answer. The easiest way to understand how conditions work in Python is try out a few examples. The following example prints out the result of testing (or evaluating) a bunch of different conditions – some mathematical examples, some using string methods, and one for testing if a value is included in a list:
In [ ]:
print(3 == 5)
In [ ]:
print(3 > 5)
In [ ]:
print(3 <= 5)
In [ ]:
print(len("ATGC") > 5)
In [ ]:
print("GAATTC".count("T") > 1)
In [ ]:
print("ATGCTT".startswith("ATG"))
In [ ]:
print("ATGCTT".endswith("TTT"))
In [ ]:
print("ATGCTT".isupper())
In [ ]:
print("ATGCTT".islower())
In [ ]:
print("V" in ["V", "W", "L"])
But what’s actually being printed here? At first glance, it looks like we’re printing the strings “True” and “False”, but those strings don’t appear anywhere in our code. What is actually being printed is the special built-in values that Python uses to represent true and false – they are capitalized so that we know they’re these special values.
We can show that these values are special by trying to print them. The following code runs without errors (note the absence of quotation marks):
In [ ]:
print(True)
In [ ]:
print(False)
There’s a wide range of things that we can include in conditions, and it would be impossible to give an exhaustive list here. The basic building blocks are:
are two objects the same (represented by is).
Notice that the test for equality is two equals signs, not one. Forgetting the second equals sign will cause an error.
Now that we know how to express tests as conditions, let’s see what we can do with them.
The simplest kind of conditional statement is an if statement. Hopefully the syntax is fairly simple to understand:
In [ ]:
expression_level = 125
if expression_level > 100:
print("gene is highly expressed")
We write the word if, followed by a condition, and end the first line with a colon. There follows a block of indented lines of code (the body of the if statement), which will only be executed if the condition is true. This colon-plus-block pattern should be familiar to you from the sections on loops and functions.
Most of the time, we want to use an if statement to test a property of some variable whose value we don’t know at the time when we are writing the program. The example above is obviously useless, as the value of the expression_level variable is not going to change!
Here’s a slightly more interesting example: we’ll define a list of gene accession names and print out just the ones that start with “a”:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a'):
print(accession)
If you take a close look at the code above, you’ll see something interesting – the lines of code inside the loop are indented (just as we’ve seen before), but the line of code inside the if statement is indented twice – once for the loop, and once for the if. This is the first time we’ve seen multiple levels of indentation, but it’s very common once we start working with larger programs – whenever we have one loop or if statement nested inside another, we’ll have this type of indentation.
Python is quite happy to have as many levels of indentation as needed, but you’ll need to keep careful track of which lines of code belong at which level. If you find yourself writing a piece of code that requires more than three levels of indentation, it’s generally an indication that that piece of code should be turned into a function.
Closely related to the if statement is the else statement. The examples above use a yes/no type of decision-making: should we print the gene accession number or not? Often we need an either/or type of decision, where we have two possible actions to take. To do this, we can add on an else clause after the end of the body of an if statement:
In [ ]:
expression_level = 125
if expression_level > 100:
print("gene is highly expressed")
else:
print("gene is lowly expressed")
The else statement doesn’t have any condition of its own – rather, the else statement body is execute when the if statement to which it’s attached is not executed.
Here’s an example which uses if and else to split up a list of accession names into two different files – accessions that start with “a” go into the first file, and all other accessions go into the second file:
In [ ]:
file1 = open("a_accessions.txt", "w")
file2 = open("other_accessions.txt", "w")
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a'):
file1.write(accession + "\n")
else:
file2.write(accession + "\n")
Notice how there are multiple indentation levels as before, but that the if and else statements are at the same level.
What if we have more than two possible branches? For example, say we want three files of accession names: ones that start with “a”, ones that start with “b”, and all others. We could have a second if statement nested inside the else clause of the first if statement:
In [ ]:
file1 = open("a_accessions.txt", "w")
file2 = open("b_accessions.txt", "w")
file3 = open("other_accessions.txt", "w")
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a'):
file1.write(accession + "\n")
else:
if accession.startswith('b'):
file2.write(accession + "\n")
else:
file3.write(accession + "\n")
This works, but is difficult to read – we can quickly see that we need an extra level of indentation for every additional choice we want to include. To get round this, Python has an elif statement, which merges together else and if and allows us to rewrite the above example in a much more elegant way:
In [ ]:
file1 = open("a_accessions.txt", "w")
file2 = open("b_accessions.txt", "w")
file3 = open("other_accessions.txt", "w")
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a'):
file1.write(accession + "\n")
elif accession.startswith('b'):
file2.write(accession + "\n")
else:
file3.write(accession + "\n")
Notice how this version of the code only needs two levels of indention. In fact, using elif we can have any number of branches and still only require a single extra level of indentation:
In [ ]:
for accession in accs:
if accession.startswith('a'):
file1.write(accession + "\n")
elif accession.startswith('b'):
file2.write(accession + "\n")
elif accession.startswith('c'):
file3.write(accession + "\n")
elif accession.startswith('d'):
file4.write(accession + "\n")
elif accession.startswith('e'):
file5.write(accession + "\n")
else:
file6.write(accession + "\n")
Another way of handling complex decision branches like this – especially useful when dealing with validation and errors – is using exceptions, which have their own chapter in Advanced Python for Biologists.
Here’s one final thing we can do with conditions: use them to determine when to exit a loop. In section 4 we learned about loops that iterate over a collection of items (like a list, a string or a file). Python has another type of loop called a while loop. Rather than running a set number of times, a while loop runs until some condition is met. For example, here’s a bit of code that increments a count variable by one each time round the loop, stopping when the count variable reaches ten:
In [ ]:
count = 0
while count<10:
print(count)
count = count + 1
Because normal loops in Python are so powerful2 , while loops are used much less frequently than in other languages, so we won’t discuss them further.
What if we wanted to express a condition that was made up of several parts? Imagine we want to go through our list of accessions and print out only the ones that start with “a” and end with “3”. We could use two nested if statements:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a'):
if accession.endswith('3'):
print(accession)
but this brings in an extra, unneeded level of indention. A better way is to join up the two condition with and to make a complex expression:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a') and accession.endswith('3'):
print(accession)
This version is nicer in two ways: it doesn’t require the extra level of indentation, and the condition reads in a very natural way. We can also use or to join up two conditions, to produce a complex condition that will be true if either of the two simple conditions are true:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for accession in accs:
if accession.startswith('a') or accession.startswith('b'):
print(accession)
We can even join up complex conditions to make more complex conditions – here’s an example which prints accessions if they start with either “a” or “b” and end with “4”:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for acc in accs:
if (acc.startswith('a') or acc.startswith('b')) and acc.endswith('4'):
print(acc)
Notice how we have to include parentheses in the above example to avoid ambiguity. Finally, we can negate any type of condition by prefixing it with the word not. This example will print out accessions that start with “a” and don’t end with 6:
In [ ]:
accs = ['ab56', 'bh84', 'hv76', 'ay93', 'ap97', 'bd72']
for acc in accs:
if acc.startswith('a') and not acc.endswith('6'):
print(acc)
By using a combination of and, or and not (along with parentheses where necessary) we can build up arbitrarily complex conditions. This kind of use for conditions – identifying elements in a list – can often be done better using either the filter function, or a list comprehension.
These three words are collectively known as boolean operators and crop up in a lot of places. For example, if you wanted to search for information on using Python in biology, but didn’t want to see pages that talked about biology of snakes, you might do a search for “biology python -snake“. This is actually a complex condition just like the ones above – Google automatically adds and between words, and uses the hyphen to mean not. So you’re asking for pages that mention python and biology but not snakes.
Sometimes we want to write a function that can be used in a condition. This is very easy to do – we just make sure that our function always returns either True or False. Remember that True and False are built-in values in Python, so they can be passed around, stored in variables, and returned, just like numbers or strings.
Here’s a function that determines whether or not a DNA sequence is AT-rich (we’ll say that a sequence is AT-rich if it has an AT content of more than 0.65):
In [ ]:
def is_at_rich(dna):
length = len(dna)
a_count = dna.upper().count('A')
t_count = dna.upper().count('T')
at_content = (a_count + t_count) / length
if at_content > 0.65:
return True
else:
return False
We’ll test this function on a few sequences to see if it works:
In [ ]:
print(is_at_rich("ATTATCTACTA"))
print(is_at_rich("CGGCAGCGCT"))
The output shows that the function returns True or False just like the other conditions we’ve been looking at:
True
False
Therefore we can use our function in an if statement:
In [ ]:
if is_at_rich(my_dna):
# do something with the sequence
Because the last four lines of our function are devoted to evaluating a condition and returning True or False, we can write a slightly more compact version. In this example we evaluate the condition, and then return the result right away:
In [ ]:
def is_at_rich(dna):
length = len(dna)
a_count = dna.upper().count('A')
t_count = dna.upper().count('T')
at_content = (a_count + t_count) / length
return at_content > 0.65
This is a little more concise, and also easier to read once you’re familiar with the idiom.
In this short section, we’ve dealt with two things: conditions, and the statements that use them.
We’ve seen how simple conditions can be joined together to make more complex ones, and how the concepts of truth and falsehood are built in to Python on a fundamental level. We’ve also seen how we can incorporate True and False in our own functions in a way that allows them to be used as part of conditions.
We’ve been introduced to four different tools that use conditions – if, else, elif, and while – in approximate order of usefulness. You’ll probably find, in the programs that you write and in your solutions to the exercises in this book, that you use if and else very frequently, elif occasionally, and while almost never.
In [ ]: